Generalizing Matrix Multiplication for Efficient Computations on Modern Computers

Authors

  • Stanislav G. Sedukhin
  • Marcin Paprzycki
Abstract

Recent advances in computing allow taking a new look at matrix multiplication. The key ideas are a decreasing interest in recursion, the development of processors with thousands (potentially millions) of processing units, and influences from Algebraic Path Problems. In this context, we propose a generalized matrix-matrix multiply-add (MMA) operation and illustrate its usability. Furthermore, we elaborate on the interrelation between this generalization and the BLAS standard.
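
The abstract does not spell out the generalized MMA operation, so the following is only a minimal sketch under a stated assumption: that the generalization replaces the usual (+, ×) pair with the two operations of a semiring, as is common for Algebraic Path Problems, giving C ← C ⊕ (A ⊗ B). The function name `generalized_mma` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import Callable, List

Matrix = List[List[float]]
INF = float("inf")


def generalized_mma(
    c: Matrix,
    a: Matrix,
    b: Matrix,
    add: Callable[[float, float], float],
    mul: Callable[[float, float], float],
) -> Matrix:
    """Return C (+) (A (x) B), where (+) is `add` and (x) is `mul`.

    Assumption (not from the paper): `add` and `mul` form a semiring.
    With add = +, mul = * this is the ordinary multiply-add C + A*B.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [row[:] for row in c]  # copy so the input C is left unchanged
    for i in range(rows):
        for j in range(cols):
            acc = out[i][j]
            for p in range(inner):
                acc = add(acc, mul(a[i][p], b[p][j]))
            out[i][j] = acc
    return out


# Ordinary arithmetic: C + A*B with C = 0.
zero = [[0.0, 0.0], [0.0, 0.0]]
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
plain = generalized_mma(zero, a, b, lambda x, y: x + y, lambda x, y: x * y)

# (min, +) semiring: one step of an all-pairs shortest-path update.
# w[i][j] is the weight of edge i -> j, INF where no edge exists.
w = [
    [0.0, 3.0, INF],
    [INF, 0.0, 1.0],
    [2.0, INF, 0.0],
]
two_hop = generalized_mma(w, w, w, min, lambda x, y: x + y)
```

In this sketch the same triple loop serves both cases; only the pair of scalar operations changes. With (min, +), `two_hop[i][j]` holds the shortest distance from i to j using at most two edges.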


Similar resources

Generalizing of a High Performance Parallel Strassen Implementation on Distributed Memory MIMD Architectures

Strassen’s algorithm to multiply two n×n matrices reduces the asymptotic operation count from O(n^3) for the traditional algorithm to O(n^2.81), so designing an efficient parallelization of this algorithm becomes essential. In this paper, we present our generalization of a parallel Strassen implementation which obtained very good performance on an Intel Paragon: 20% faster for n ≈ 1000 and more than 100%...


A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that could run on a Fibonacci Hypercube structure. Most popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; therefore, a method that can run on all structures, especially the Fibonacci Hypercube structure, is necessary for parallel matr...


Vectorization and Parallelization of Loops in C/C++ Code

Modern computer processors can support parallel execution of a program by using their multiple cores. Computers can also support vector operations by using their extended SIMD instructions. To make a computer program run faster, the time-consuming loop computations in the program can often be parallelized and vectorized to utilize the capacity of multiple cores and extended SIMD instructions. In this p...


Fast matrix multiplication techniques based on the Adleman-Lipton model

On distributed memory electronic computers, the implementation and association of fast parallel matrix multiplication algorithms has yielded astounding results and insights. In this discourse, we use the tools of molecular biology to demonstrate the theoretical encoding of Strassen’s fast matrix multiplication algorithm with DNA based on an n-moduli set in the residue number system, t...


Developing Tensor Operations with an Underlying Group Structure

Tensor computations frequently involve factoring or decomposing a tensor into a sum of rank-1 tensors (CANDECOMP-PARAFAC, HOSVD, etc.). These decompositions are often considered as different higher-order extensions of the matrix SVD. The HOSVD can be described using the n-mode product, which describes multiplication between a higher-order tensor and a matrix. Generalizing this multiplication le...



Publication date: 2011